Text Structure Aiming at Machine Translation

Author

  • Horacio Saggion
Abstract

[Figure 6: Meaning Representation Construction (diagram labels: STRUCTURE ASSEMBLER, STRUCTURE SPAN)]

These processes operate on the following components of the meaning representation:

  • Propositions: produced as a result of syntactic and semantic analysis.

  • Semantic and Syntactic Signals: guide the coherence assembler in selecting coherence relations and in deciding where a coherent span ends. Syntactic signals include discourse markers that directly signal the structure of the discourse [Hirschberg and Litman, 1993]. These markers are the primary indication of the presence of a coherence relation in the text. Tense, aspect and semantic information attached to lexical items provide a means of deciding the limits of a text span [Grosz and Sidner, 1986].

  • Partial Structure: stores propositions and segments that are already linked and await further processing. When processing a proposition Pk, two problems must be resolved: (a) deciding to which text segment Pk will be attached; (b) deciding how the attachment to that segment will be done. Propositions must be saved temporarily until a decision is made.

  • Coherence Rules: define the conditions that propositions must satisfy in order to be linked by a coherence relation.

  • Coherent Span: a group of propositions related by coherence relations. It carries informational content associated with one of the Informational Categories presented earlier.

5 Detailed Example

Figure 7 shows the structure produced by the analysis of the example from Figure 2.

[Figure 7: Text Structure (tree labels: BACKGROUND, OBJECTIVES, RECOMMENDATIONS, ENABLE, ELABORATION, PARALLEL, over propositions (1) to (8))]

The main processes that led to this structure are:

  • Breaking each sentence into propositions: using syntactic and semantic analysis.

  • Determining references for definite anaphora: the noun phrase "este trabalho" in proposition (2) is resolved using specific knowledge about abstracts. The corresponding definite noun phrase is "this paper". Various entities only make sense in the context of abstracts. These entities include "the authors", "the paper", "the work", "the objective" and the like. This information is included in the knowledge base system and is very useful when looking for an antecedent for a definite noun phrase. The noun phrase "este tipo de trocador" in proposition (5) is resolved using the preceding discourse. The antecedent is "trocadores de calor compactos", introduced in proposition (1). The noun phrase "o método recomendado" is also resolved using the previous discourse.

  • Determining the limits of each text span: in proposition (1) the verbal form "são" carries semantic information about general facts (one entity is "defined"). "Is-a" sentences are usually analysed in this way [Sidner, 1978], so proposition (1) is classified as Background. In proposition (2) the verb "apresentar" is used. This verb generally carries purpose or objective information [Jordan, 1991]. Additionally, the noun phrase "este trabalho", which was found to mean "this paper", acts as the subject of the sentence. Since a "paper" has an objective, we can deduce that the proposition marks the objective of the paper. The Objective category is therefore selected, and it spans up to proposition (5). In proposition (6) the item "recomendado" marks the beginning of a new text span, which is classified as Recommendation. Figure 7 shows the limits of each text span.

  • Determining coherence relations: syntactic marks guide the selection of coherence relations. For example, propositions (3) to (5) are linked by coordination, syntactically indicated by commas and by the conjunction "e"; this could mark a Parallel or a Sequence relation. Note, however, that the same argument, "este tipo de trocador", is used in all three propositions, which signals a preference for a Parallel relation.
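The span-limit and coherence-relation steps described above can be illustrated with a minimal Python sketch. It is not the system described in the paper: the `Proposition` class, the `CATEGORY_CUES` table, both function names, and the placeholder predicates and arguments for propositions (3) to (5) are assumptions introduced only for illustration. The sketch assumes that anaphora has already been resolved, that a cue word opens a new span, and that coordinated propositions sharing an argument are preferred as Parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    number: int       # proposition index, as in Figure 7
    predicate: str    # main verb or cue word (hypothetical lemma form)
    arguments: tuple  # noun-phrase arguments after anaphora resolution


# Hypothetical cue table: lexical item -> informational category.
CATEGORY_CUES = {
    "ser": "Background",            # "são": is-a sentence stating a general fact
    "apresentar": "Objective",      # verb that usually carries purpose information
    "recomendado": "Recommendation",
}


def segment_spans(props):
    """Group consecutive propositions into spans; a cue word opens a new span."""
    spans = []
    for p in props:
        category = CATEGORY_CUES.get(p.predicate)
        if category is not None and (not spans or spans[-1][0] != category):
            spans.append((category, [p.number]))
        elif spans:
            # No new cue: the proposition extends the current span.
            spans[-1][1].append(p.number)
        else:
            spans.append(("Unknown", [p.number]))
    return spans


def coordination_relation(props):
    """Prefer Parallel when every coordinated proposition shares an argument."""
    shared = set(props[0].arguments)
    for p in props[1:]:
        shared &= set(p.arguments)
    return "Parallel" if shared else "Sequence"


# Illustrative input; predicates and extra arguments of (3)-(5) are placeholders.
HEAT_EXCHANGERS = "trocadores de calor compactos"
example = [
    Proposition(1, "ser", (HEAT_EXCHANGERS,)),
    Proposition(2, "apresentar", ("este trabalho",)),
    Proposition(3, "p3", (HEAT_EXCHANGERS, "a3")),
    Proposition(4, "p4", (HEAT_EXCHANGERS, "a4")),
    Proposition(5, "p5", (HEAT_EXCHANGERS, "a5")),
    Proposition(6, "recomendado", ("o método recomendado",)),
]

print(segment_spans(example))
print(coordination_relation(example[2:5]))
```

In this sketch a shared argument is detected by exact string match on the resolved antecedent, which is why "este tipo de trocador" is represented by its antecedent. A fuller treatment would compare discourse entities rather than strings.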
The other coherence relations from the abstract shown in Figure 2 are shown in Figure 7.

6 Conclusions

Traditional approaches to machine translation have usually neglected the problem of text structure, treating the source input as a disconnected sequence of sentences. As a result, the representations used by these approaches were not able to capture and make use of the coherence phenomena present in the input.

We concentrate on the specification and construction of a meaning representation of abstracts from scientific papers in Portuguese. This representation must capture the informational content, the coherence relations and the propositional content of the input text. We believe that this representation is appropriate for machine translation because it copes not only with the message being conveyed, but also with the structure of the text. Representing the linguistic structure of the text enables a generator program to choose the superficial forms needed to express the message correctly in the target language while preserving the original structure of the text. Several steps are involved in the construction of such a representation: syntactic analysis, semantic interpretation, anaphora resolution, determination of text spans and determination of coherence relations. We are working with a set of these relations, which were defined according to the phenomena observed in the corpus. Additional research is needed to expand this set to cope with more relations. Also, we have only treated the problem of definite anaphora, through the incorporation of knowledge about the domain of the discourse into the system; more research is needed to cope with other kinds of anaphora.

Acknowledgements

We would like to thank Jorge Stolfi for his valuable comments on a previous version of this paper.

References

[ABNT, 1987] ABNT - Associação Brasileira de Normas Técnicas. Resumos. 1987.

[Bazzo and Pereira, 1989] Bazzo, W.A. and Teixeira do Vale Pereira, L. Criatividade na Engenharia. Revista de Ensino de Engenharia, São Paulo, 8(1):8-11, 1º semestre 1989.

[Danlos, 1987] Danlos, L. The Linguistic Basis of Text Generation. Studies in Natural Language Processing, Cambridge University Press, 1987.

[da Silveira Neto and Hernandez Mendoza, 1988] da Silveira Neto, A. and Hernandez Mendoza, O.S. Trocadores de Calor Compactos Bancada de Testes. Revista de Ensino de Engenharia, São Paulo, 7(1):43-48, 1º semestre 1988.

[de Araujo and Szeremeta, 1985] de Araujo, N.D. and Szeremeta, J.F. Uma Experiência no Ensino de Cálculo Numérico na UFSC. Revista de Ensino de Engenharia, 4(2):138-139, São Paulo, 2º Sem. 1985.

[Gomide and Fernandez, 1985] Gomide, H.A. and Fernandez, E.F. Curso de Similitude em Engenharia. Revista de Ensino de Engenharia, 4(2):125-132, São Paulo, 2º Sem. 1985.

[Grimes, 1975] Grimes, J. The Thread of Discourse. Mouton and Company, The Hague, Netherlands, 1975.

[Grosz and Sidner, 1986] Grosz, B.J. and Sidner, C.L. Attention, Intentions and the Structure of Discourse. Computational Linguistics, Vol. 12, Num. 3, July-September 1986.

[Halliday and Hasan, 1976] Halliday, M.A. and Hasan, R. Cohesion in English. London, Longman Press, 1976.

[Hirst, 1981a] Hirst, G. Anaphora in Natural Language Understanding: A Survey. Lecture Notes in Computer Science 119. Springer-Verlag, 1981.

[Hirst, 1981b] Hirst, G. Discourse-Oriented Anaphora Resolution in Natural Language Understanding: A Review. American Journal of Computational Linguistics, Vol. 7, Num. 2, April-June 1981.

[Hirschberg and Litman, 1993] Hirschberg, J. and Litman, D. Empirical Studies on the Disambiguation of Cue Phrases. Computational Linguistics, Vol. 19, Num. 3, 1993.

[Hobbs, 1978a] Hobbs, J.R. Coherence and Coreference. SRI International. Technical Note 168, August 1978.

[Hobbs, 1978b] Hobbs, J.R. Why Is Discourse Coherent? SRI International. Technical Note 176, November 1978.

[Hutchins, 1985] Hutchins, W.J. Information Retrieval and Text Analysis. In New Approaches to the Analysis of Mass, Media, Discourse and Communication. T.A. van Dijk (Ed.), Gruyter, Berlin, 1985.

[Hutchins, 1987] Hutchins, W.J. Summarization: Some Problems and Methods. Meaning: The Frontier of Informatics. K. Jones (Ed.), Cambridge, London, 1987.

[Jordan, 1991] Jordan, M.P. The Linguistic Genre of Abstracts. In A. Della Volpe (Ed.), The Seventeenth LACUS Forum. Linguistics Association of Canada and the United States, 1991.

[Lewis, 1992] Lewis, D. Computers and Translation. In Computers and Written Texts. Christopher S. Butler (Ed.). Blackwell, 1992.

[Mann and Thompson, 1983] Mann, W.C. and Thompson, S.A. Relational Propositions in Discourse. Information Sciences Institute, Technical Report RR-83-115, November 1983.

[Mann and Thompson, 1987] Mann, W.C. and Thompson, S.A. Rhetorical Structure Theory: A Theory of Text Organization. ISI Reprint Series, ISI/RS-87-190, June 1987.

[Moore and Paris, 1994] Moore, J.D. and Paris, C.L. Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information. Computational Linguistics, Vol. 19, Num. 4, 1994.

[Nirenburg and Carbonell, 1987] Nirenburg, S. and Carbonell, J. Integrating Discourse Pragmatics and Propositional Knowledge for Multilingual Natural Language Processing. Computers and Translation (2). Paradigm Press, Inc., 1987.

[Raposo, 1992] Raposo, E.P. Teoria da Gramática. A Faculdade da Linguagem. Ed. Caminho, Lisboa, 1992.

[Scolum, 1985] Scolum, J. A Survey of Machine Translation. Computational Linguistics, Vol. 11, Num. 1, 1985.

[Sidner, 1978] Sidner, C.L. The Use of Focus as a Tool for Disambiguation of Definite Noun Phrases. TINLAP-2, 1978.

[Tucker, 1984] Tucker, A.B. A Perspective on Machine Translation: Theory and Practice. Communications of the ACM, Vol. 27, Num. 4, April 1984.

[Weissberg and Buker, 1990] Weissberg, R. and Buker, S. Writing Up Research. Prentice-Hall, Inc., 1990.

[Wilks, 1973] Wilks, Y. An Artificial Intelligence Approach to Machine Translation. In Computer Models of Thought and Language, Schank, R. and Colby, K. (Eds.), Freeman, San Francisco, 1973.

[Winograd, 1983] Winograd, T. Language as a Cognitive Process. Addison-Wesley Publishing Company, Inc., 1983.
Departamento de Ciência da Computação, IMECC, Universidade Estadual de Campinas, Caixa Postal 6065, 13081-970 Campinas


Publication date: 1995